National Repository of Grey Literature (Národní úložiště šedé literatury). 8 records found. The search took 0.00 seconds.
Improving Robustness of Speaker Recognition using Discriminative Techniques
Novotný, Ondřej ; Ferrer, Luciana (reviewer) ; Pollák, Petr (reviewer) ; Černocký, Jan (supervisor)
This work deals with discriminative techniques in speaker verification systems that improve the robustness of these systems against factors that degrade their performance, such as noise, reverberation, and the transmission channel. The thesis consists of two main parts. The first part gives a theoretical introduction to current state-of-the-art speaker verification systems. It describes the steps of a recognition system: the extraction of acoustic features, the extraction of vector representations of recordings, and the final computation of the recognition score. Particular emphasis is placed on techniques for extracting a vector representation of a recording, where two different paradigms are described: i-vectors and x-vectors. The second part focuses on discriminative techniques for increasing robustness. Their description follows the passage of a recording through the verification system. First, attention is paid to signal pre-processing that uses a neural network for noise reduction and speech enhancement. This pre-processing is a universal technique, independent of the verification system. The work then focuses on the use of a discriminative approach in feature extraction and in the extraction of vector representations of recordings. It also sheds light on the transition from generative to discriminative systems and, to give fuller context, describes the techniques that historically preceded this transition. All presented techniques are experimentally verified and their advantages evaluated. Several proposed techniques proved successful with both the generative approach (i-vectors) and the discriminative approach (x-vectors), and they yielded considerable improvement. For completeness, other robustness techniques are also covered, such as score normalization and multi-condition training. Finally, the work deals with the robustness of discriminative systems with respect to the data used in their training.
Searching for New Paths in Neural-Network-Based Speaker Recognition (Hledání nových cest v rozpoznávání řečníka založeného na neuronových sítích)
Sova, Damián ; Matějka, Pavel (reviewer) ; Glembek, Ondřej (supervisor)
Because the assignment of this thesis is very broad, it was necessary to focus on one specific area. The goal of the thesis is to apply the Stochastic Weight Averaging optimization method to the training process of a deep neural network. The first part presents the necessary theoretical background, and the second part describes the course of the individual experiments. The theoretical part emphasizes the whole life cycle of the training and evaluation process, including a description of the individual components. The practical part gives a detailed view of each experiment; the experiments aim to demonstrate that the performance of a speaker recognition system can be improved. The overall improvement was achieved by successively applying different training configurations that took into account the experience gained in previous experiments. The key ingredient of successful Stochastic Weight Averaging in the experiments was either a sufficiently high constant learning rate with a gradual transition applied, or a cyclical learning rate schedule.
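The idea behind Stochastic Weight Averaging with a cyclical learning rate can be illustrated with a minimal sketch. This is not the implementation used in the thesis: the toy quadratic loss, the linear decay within each cycle, and all names (`cyclical_lr`, `sgd_with_swa`, `swa_start`, `cycle_len`) are assumptions made only for this example. Weights are averaged at the end of each learning-rate cycle once averaging has started.

```python
def cyclical_lr(step, cycle_len, lr_max, lr_min):
    """Hypothetical schedule: decay linearly from lr_max to lr_min, then restart."""
    pos = (step % cycle_len) / max(cycle_len - 1, 1)
    return lr_max - pos * (lr_max - lr_min)


def sgd_with_swa(grad_fn, w0, steps, swa_start, cycle_len, lr_max, lr_min):
    """Plain gradient descent that also keeps a running average of the weights.

    A snapshot is added to the average at the end of every cycle
    once `step >= swa_start`.
    """
    w = list(w0)
    w_swa = None  # running average of the collected snapshots
    n_avg = 0     # number of snapshots averaged so far
    for step in range(steps):
        lr = cyclical_lr(step, cycle_len, lr_max, lr_min)
        g = grad_fn(w)
        w = [wi - lr * gi for wi, gi in zip(w, g)]
        end_of_cycle = (step + 1) % cycle_len == 0
        if step >= swa_start and end_of_cycle:
            n_avg += 1
            if w_swa is None:
                w_swa = list(w)
            else:
                # Incremental mean: avg += (new - avg) / n
                w_swa = [a + (wi - a) / n_avg for a, wi in zip(w_swa, w)]
    return w, w_swa, n_avg


# Toy problem (an assumption): minimize 0.5 * ||w - target||^2.
target = [1.0, -2.0]
grad = lambda w: [wi - ti for wi, ti in zip(w, target)]

w_last, w_avg, n = sgd_with_swa(grad, [0.0, 0.0], steps=40, swa_start=20,
                                cycle_len=5, lr_max=0.5, lr_min=0.05)
print(n, w_avg)
```

In a real training run the averaged weights, not the last iterate, would be the ones evaluated; for networks with batch normalization the normalization statistics typically have to be recomputed for the averaged weights.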
Penetration Tests of a Speaker Verification System (Penetrační testy systému pro verifikaci řečníka)
Nguyen, QuangTrang ; Rohdin, Johan Andréas (reviewer) ; Plchot, Oldřich (supervisor)
The goal of this bachelor's thesis is to design a set of penetration tests for speaker verification using speech synthesis and available recordings of the target speakers. The thesis covers a study of speech synthesis, speaker verification, and the spoofing methods that may be encountered. Before the test set itself is designed, the system used in this work and its components are described. The final chapters describe the design of the test sets and the way the tests were carried out. The conclusion evaluates the results and answers the question of whether a speaker verification system can be broken using a speech synthesis method.
Non-Parallel Voice Conversion
Brukner, Jan ; Plchot, Oldřich (reviewer) ; Černocký, Jan (supervisor)
Voice conversion (VC) aims at converting the voice of a source speaker to the voice of a target speaker. It is popular in funny Internet videos but also has a number of serious use cases, such as dubbing of audiovisual material and anonymization of voice (for example, for witness protection). Because it can be used to spoof voice identification systems, it is also an important tool for developing spoofing detectors and countermeasures. VC models have mainly been trained on parallel audio (i.e., two speakers uttering the same text) and on high-quality audio material. The goal of this thesis was to investigate developing VC on non-parallel data and with low-quality signals, mainly from the publicly available VoxCeleb dataset. The work follows the state-of-the-art AutoVC architecture defined by Qian et al. It is based on neural network (NN) autoencoders and aims to separate speech into content-dependent and speaker-dependent embeddings. The target speech is then obtained by replacing the source speaker embedding with the target speaker embedding. We improved Qian's architecture for low-quality audio by experimenting with different speaker embeddings (d-vectors vs. x-vectors), introducing a speaker classifier on the content embeddings in an adversarial setup, and tuning the size of the content embeddings, which imposes an information bottleneck on the autoencoder. We also defined another adversarial architecture that compares the original content embeddings with those obtained after the VC process. The experimental results show that non-parallel VC on low-quality data is indeed feasible. The resulting audio was not as good as that obtained from high-quality data, but the speaker verification results after spoofing by the proposed system clearly showed a shift of voice characteristics toward the target speakers.
